arXiv:1501.06484v1 [cs.GT] 26 Jan 2015


Symmetric Strategy Improvement 

Sven Schewe¹, Ashutosh Trivedi², and Thomas Varghese¹

¹Department of Computer Science, University of Liverpool

²Department of Computer Science and Engineering
Indian Institute of Technology - Bombay


Abstract 

Symmetry is inherent in the definition of most of the two-player zero-sum games, including 
parity, mean-payoff, and discounted-payoff games. It is therefore quite surprising that no symmetric
analysis techniques for these games exist. We develop a novel symmetric strategy improvement 
algorithm where, in each iteration, the strategies of both players are improved simultaneously. We 
show that symmetric strategy improvement defies Friedmann’s traps, which shook the belief in the 
potential of classic strategy improvement to be polynomial. 


1 Introduction 

We study turn-based graph games between two players (Player Min and Player Max) who take turns to
move a token along the vertices of a coloured finite graph so as to optimise their adversarial objectives. 
Various classes of graph games are characterised by the objective of the players, for instance in parity 
games the objective is to optimise the parity of the dominating colour occurring infinitely often, while in 
discounted and mean-payoff games the objective is the discounted and limit-average sum of the colours. 
Solving graph games is the central and most expensive step in many model checking,
satisfiability checking, and synthesis [22, 17] algorithms. More efficient algorithms for
solving graph games will therefore foster the development of performant model checkers and contribute 
to bringing synthesis techniques to practice. 

Parity games enjoy a special status among graph games, and the quest for performant algorithms
for solving them has therefore been an active field of research during the last decades.
Traditional forward techniques ($\approx O(n^{c/2})$ [15] for parity games with $n$ positions and
$c$ colours), backward techniques ($\approx O(n^{c})$), and their combination ($\approx O(n^{c/3})$ [24])
provide good complexity bounds. However, these bounds are sharp, and techniques with good
complexity bounds [24, 15] frequently display their worst-case complexity on practical examples.
Strategy improvement algorithms [23, 22, 3, 25, 10], on the other hand, are closely related to the
Simplex algorithm for solving linear programming problems, which performs well in practice.

Classic strategy improvement algorithms are built around the existence of optimal positional
strategies for both players. They start with an arbitrary positional strategy for a player and
iteratively compute a better positional strategy in every step until the strategy cannot be further
improved. Since there are only finitely many positional strategies in a finite graph, termination is
guaranteed. The crucial step in a strategy improvement algorithm is to compute a better strategy from
the current strategy. Given a current strategy $\sigma$ of a player (say, Player Max), this step is
performed by first computing the globally optimal counter strategy $\tau^c_\sigma$ of the opponent
(Player Min) and then computing the value of each vertex of the game restricted to the strategies
$\sigma$ and $\tau^c_\sigma$. For the games under discussion (parity, discounted, and mean-payoff) both
of these computations are simple and tractable. This value dictates potentially locally profitable
changes or switches $\mathrm{Prof}(\sigma)$ that Player Max can make vis-à-vis his previous strategy
$\sigma$. For the correctness of the strategy improvement algorithm it is required that such locally
profitable changes imply a global improvement. The strategy of Player Max can then be updated
according to a switching rule (akin to the pivoting rule of the Simplex algorithm) in order to give an
improved strategy. This has led to the following template for classic strategy improvement algorithms.


Algorithm 1: Classic strategy improvement algorithm 

1 determine an optimal counter strategy $\tau^c_\sigma$ for $\sigma$

2 evaluate the game for $\sigma$ and $\tau^c_\sigma$ and determine the profitable changes $\mathrm{Prof}(\sigma)$ for $\sigma$

3 update $\sigma$ by applying changes from $\mathrm{Prof}(\sigma)$ to $\sigma$


A number of switching rules, including the ones inspired by Simplex pivoting rules, have been
suggested for strategy improvement algorithms. The most widespread ones are to select changes for all
game states where this is possible, choosing a combination of those with an optimal update guarantee,
or to choose uniformly at random. For some classes of games, it is also possible to select an optimal
combination of updates [25]. There have also been suggestions to use more advanced randomisation
techniques with sub-exponential bounds of $2^{O(\sqrt{n})}$ [3] and snare memory [10]. Unfortunately,
all of these techniques have been shown to be exponential in the size of the game [11, 12, 13].

Classic strategy improvement algorithms treat the two players involved quite differently where at 
each iteration one player computes a globally optimal counter strategy, while the other player performs 
local updates. In contrast, a symmetric strategy improvement algorithm symmetrically improves the 
strategies of both players at the same time, and uses the finding to guide the strategy improvement. This
suggests the following naive symmetric approach.


Algorithm 2: Naive symmetric strategy improvement algorithm

1 determine $\tau' = \tau^c_\sigma$        determine $\sigma' = \sigma^c_\tau$

2 update $\sigma$ to $\sigma'$             update $\tau$ to $\tau'$


This algorithm has earlier been suggested by Condon, who showed that a repeated application of this
update can lead to cycles [5]. A problem with this naive approach is that there is no guarantee
that the primed strategies are generally better than the unprimed ones. With hindsight this is maybe not
very surprising, as in particular no improvement in the evaluation of running the game with $\sigma'$, $\tau'$
can be expected over running the game with $\sigma$, $\tau$, as an improvement for one player is at the
expense of the other. This observation led to the approach being abandoned. In this paper we propose the
following more careful symmetric strategy improvement algorithm that guarantees improvements in each
iteration, similar to classic strategy improvement.


Algorithm 3: Symmetric strategy improvement algorithm 

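1 determine $\tau^c_\sigma$                                          determine $\sigma^c_\tau$

2 determine $\mathrm{Prof}(\sigma)$ for $\sigma$                     determine $\mathrm{Prof}(\tau)$ for $\tau$

3 update $\sigma$ using $\mathrm{Prof}(\sigma) \cap \sigma^c_\tau$   update $\tau$ using $\mathrm{Prof}(\tau) \cap \tau^c_\sigma$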


The main difference to classic strategy improvement approaches is that we exploit the strategy of 
the other player to inform the search for a good improvement step. In this algorithm we select only such 
updates to the two strategies that agree with the optimal counter strategy to the respective other’s strategy. 
We believe that this will provide a gradually improving advice function that will lead to few iterations. 
We support this assumption by showing that this algorithm suffices to escape the traps Friedmann has 
laid to establish lower bounds for different types of strategy improvement algorithms [11, 12, 13].


2 Preliminaries


We focus on turn-based zero-sum games played between two players (Player Max and Player Min)
over finite graphs. A game arena $\mathcal{A}$ is a tuple $(V_{\mathrm{Max}}, V_{\mathrm{Min}}, E, C, \phi)$ where
$(V = V_{\mathrm{Max}} \cup V_{\mathrm{Min}}, E)$ is a finite directed graph with the set of vertices $V$ partitioned into
a set $V_{\mathrm{Max}}$ of vertices controlled by Player Max and a set $V_{\mathrm{Min}}$ of vertices controlled by
Player Min, $E \subseteq V \times V$ is the set of edges, $C$ is a set of colours, and $\phi : V \to C$ is the colour
mapping. We require that every vertex has at least one outgoing edge.

A turn-based game over A is played between players by moving a token along the edges of the arena. 
A play of such a game starts by placing a token on some initial vertex $v_0 \in V$. The player controlling
this vertex then chooses a successor vertex $v_1$ such that $(v_0, v_1) \in E$ and the token is moved to this
successor vertex. In the next turn the player controlling the vertex $v_1$ chooses the successor vertex $v_2$
with $(v_1, v_2) \in E$ and the token is moved accordingly. Both players move the token over the arena in
this manner and thus form a play of the game. Formally, a play of a game over $\mathcal{A}$ is an infinite sequence
of vertices $\langle v_0, v_1, \ldots \rangle \in V^\omega$ such that, for all $i \geq 0$, we have that $(v_i, v_{i+1}) \in E$. We write
$\mathrm{Plays}_{\mathcal{A}}(v)$ for the set of plays over $\mathcal{A}$ starting from vertex $v \in V$ and $\mathrm{Plays}_{\mathcal{A}}$ for the set of
plays of the game. We omit the subscript when the arena is clear from the context. We extend the colour
mapping $\phi : V \to C$ from vertices to plays by defining the mapping $\phi : \mathrm{Plays} \to C^\omega$ as
$\langle v_0, v_1, \ldots \rangle \mapsto \langle \phi(v_0), \phi(v_1), \ldots \rangle$.

Definition 2.1 (Graph Games). A graph game $\mathcal{G}$ is a tuple $(\mathcal{A}, \eta, \prec)$ such that $\mathcal{A}$ is an arena,
$\eta : C^\omega \to D$ is an evaluation function where $D$ is the carrier set of a complete space, and $\prec$ is a
preference ordering over $D$.

Example 2.2. Parity, mean-payoff and discounted payoff games are graph games $(\mathcal{A}, \eta, \prec)$ played
on game arenas $\mathcal{A} = (V_{\mathrm{Max}}, V_{\mathrm{Min}}, E, \mathbb{R}, \phi)$. For mean payoff games the evaluation function is
$\eta : \langle c_0, c_1, \ldots \rangle \mapsto \liminf_{i \to \infty} \frac{1}{i} \sum_{j=0}^{i-1} c_j$, while for discounted payoff games with
discount factor $\lambda \in [0, 1)$ it is $\eta : \langle c_0, c_1, \ldots \rangle \mapsto \sum_{i=0}^{\infty} \lambda^i c_i$, with $\prec$ as the natural order
over the reals. For (max) parity games the evaluation function is
$\eta : \langle c_0, c_1, \ldots \rangle \mapsto \limsup_{i \to \infty} c_i$, often used with a preference order $\prec_{\mathrm{parity}}$ where higher
even colours are preferred over smaller even colours, even colours are preferred over odd colours, and
smaller odd colours are preferred over higher odd colours.
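
As a hedged illustration of the evaluation functions in Example 2.2, the following Python sketch evaluates ultimately periodic colour sequences, given as a finite prefix followed by a cycle that is repeated forever. The function names and the prefix/cycle representation are assumptions made for this sketch, and the discounted sum is taken in its unnormalised form $\sum_i \lambda^i c_i$, which is also an assumption.

```python
from fractions import Fraction

def mean_payoff(prefix, cycle):
    """Limit average of prefix . cycle^omega; the finite prefix has no influence."""
    return Fraction(sum(cycle), len(cycle))

def discounted_payoff(prefix, cycle, lam):
    """Unnormalised discounted sum of prefix . cycle^omega for 0 <= lam < 1."""
    lam = Fraction(lam)
    head = sum(lam**i * c for i, c in enumerate(prefix))
    loop = sum(lam**i * c for i, c in enumerate(cycle))
    # Geometric series: the cycle contributes again every len(cycle) steps.
    return head + lam**len(prefix) * loop / (1 - lam**len(cycle))

def parity_payoff(prefix, cycle):
    """Highest colour that occurs infinitely often (max parity)."""
    return max(cycle)
```

Exact rational arithmetic is used so that values can be compared without rounding concerns.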

In the remainder of this paper, we will use parity games where every colour is unique, i.e., where $\phi$
is injective. All parity games can be translated into such games as discussed in [29]. For these games,
we use a valuation function based on their progress measure. We define $\eta$ as $\langle c_0, c_1, \ldots \rangle \mapsto (c, C, d)$,
where $c = \limsup_{i \to \infty} c_i$ is the dominant colour of the colour sequence, $d = \min\{i \in \omega \mid c_i = c\}$ is the
index of the first occurrence of $c$, and $C = \{c_i \mid i < d, c_i > c\}$ is the set of colours that occur before the
first occurrence of $c$. The preference order is defined as the following: we have $(c', C', d') \prec (c, C, d)$ if

• $c' \prec_{\mathrm{parity}} c$,

• $c = c'$, the highest colour $h$ in the symmetric difference between $C$ and $C'$ is even, and in $C$,

• $c = c'$, the highest colour $h$ in the symmetric difference between $C$ and $C'$ is odd, and in $C'$,

• $c = c'$ is even, $C = C'$, and $d < d'$, or

• $c = c'$ is odd, $C = C'$, and $d > d'$.
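
The valuation and its preference order can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: colour sequences are given as a prefix plus a repeated cycle, and the names `valuation`, `parity_key` and `worse` are hypothetical.

```python
def valuation(prefix, cycle):
    """Return (c, C, d) for the colour sequence prefix . cycle^omega."""
    c = max(cycle)                       # dominant colour: highest one seen infinitely often
    seq = prefix + cycle
    d = seq.index(c)                     # index of the first occurrence of c
    higher = frozenset(x for x in seq[:d] if x > c)
    return (c, higher, d)

def parity_key(colour):
    """Order colours from Player Max's point of view: even up, odd down."""
    return colour if colour % 2 == 0 else -colour

def worse(primed, unprimed):
    """True iff primed = (c', C', d') is strictly below unprimed = (c, C, d)."""
    c1, set1, d1 = primed
    c2, set2, d2 = unprimed
    if c1 != c2:
        return parity_key(c1) < parity_key(c2)
    diff = set1 ^ set2                   # symmetric difference of the colour sets
    if diff:
        h = max(diff)
        return h in set2 if h % 2 == 0 else h in set1
    if c1 % 2 == 0:
        return d2 < d1                   # even dominant colour: reach it sooner
    return d2 > d1                       # odd dominant colour: reach it later
```

For instance, the colour sequence 4, 3, 3, 3, ... yields the valuation (3, {4}, 1), which this order ranks above (3, ∅, 0).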

Definition 2.3 (Strategies). A strategy of Player Max is a function $\sigma : V^* V_{\mathrm{Max}} \to V$ such that
$(v, \sigma(\pi v)) \in E$ for all $\pi \in V^*$ and $v \in V_{\mathrm{Max}}$. Similarly, a strategy of Player Min is a function
$\tau : V^* V_{\mathrm{Min}} \to V$ such that $(v, \tau(\pi v)) \in E$ for all $\pi \in V^*$ and $v \in V_{\mathrm{Min}}$. We write $\Sigma^\infty$ and $T^\infty$
for the set of strategies of Player Max and Player Min, respectively.

Definition 2.4 (Valuation). For a strategy pair $(\sigma, \tau) \in \Sigma^\infty \times T^\infty$ and an initial vertex $v \in V$ we
denote the unique play starting from the vertex $v$ by $\pi(v, \sigma, \tau)$ and we write $\mathrm{val}_{\mathcal{G}}(v, \sigma, \tau)$ for the
value of the vertex $v$ under the strategy pair $(\sigma, \tau)$ defined as

$\mathrm{val}_{\mathcal{G}}(v, \sigma, \tau) = \eta(\phi(\pi(v, \sigma, \tau)))$.
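
When both strategies depend only on the current vertex, the unique play from a vertex is ultimately periodic and can be computed by following the chosen edges until a vertex repeats. The sketch below assumes such memoryless strategies, represented as Python dictionaries from vertices to successors; the function name `play_lasso` is hypothetical.

```python
def play_lasso(start, sigma, tau):
    """Return (prefix, cycle) of the unique play from `start` under the
    memoryless strategies `sigma` (Player Max) and `tau` (Player Min)."""
    choice = {**sigma, **tau}      # every vertex belongs to exactly one player
    position = {}                  # vertex -> index of its first visit
    path = []
    v = start
    while v not in position:
        position[v] = len(path)
        path.append(v)
        v = choice[v]
    k = position[v]                # the play repeats path[k:] forever
    return path[:k], path[k:]
```

Applying the colour mapping to the two lists and then an evaluation function gives the value of the start vertex under the strategy pair.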
Figure 1: Parity game arena with four vertices and unique colours.

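We also define the concept of the value of a strategy $\sigma \in \Sigma^\infty$ and $\tau \in T^\infty$ as

$\mathrm{val}_{\mathcal{G}}(v, \sigma) = \inf_{\tau \in T^\infty} \mathrm{val}_{\mathcal{G}}(v, \sigma, \tau)$ and $\mathrm{val}_{\mathcal{G}}(v, \tau) = \sup_{\sigma \in \Sigma^\infty} \mathrm{val}_{\mathcal{G}}(v, \sigma, \tau)$.

We also extend the valuation for vertices to a valuation for the whole game by defining $V$-dimensional
vectors $\mathrm{val}_{\mathcal{G}}(\sigma) : v \mapsto \mathrm{val}_{\mathcal{G}}(v, \sigma)$ with the usual $V$-dimensional partial order $\sqsubseteq$, where
$\mathrm{val} \sqsubseteq \mathrm{val}'$ if, and only if, $\mathrm{val}(v) \preceq \mathrm{val}'(v)$ holds for all $v \in V$.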

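Definition 2.5 (Positional Determinacy). We say that a strategy $\sigma \in \Sigma^\infty$ is memoryless or positional
if it only depends on the last state, i.e. for all $\pi, \pi' \in V^*$ and $v \in V_{\mathrm{Max}}$ we have that $\sigma(\pi v) = \sigma(\pi' v)$.
Thus, a positional strategy can be viewed as a function $\sigma : V_{\mathrm{Max}} \to V$ such that for all $v \in V_{\mathrm{Max}}$ we
have that $(v, \sigma(v)) \in E$. The concept of positional strategies of Player Min is defined in an analogous
manner. We write $\Sigma$ and $T$ for the set of positional strategies of Players Max and Min, respectively. We
say that a game is positionally determined if:

• $\mathrm{val}_{\mathcal{G}}(v, \sigma) = \min_{\tau \in T} \mathrm{val}_{\mathcal{G}}(v, \sigma, \tau)$ holds for all $\sigma \in \Sigma$,

• $\mathrm{val}_{\mathcal{G}}(v, \tau) = \max_{\sigma \in \Sigma} \mathrm{val}_{\mathcal{G}}(v, \sigma, \tau)$ holds for all $\tau \in T$,

• Existence of value: for all $v \in V$, $\max_{\sigma \in \Sigma} \mathrm{val}_{\mathcal{G}}(v, \sigma) = \min_{\tau \in T} \mathrm{val}_{\mathcal{G}}(v, \tau)$ holds, and we use
$\mathrm{val}_{\mathcal{G}}(v)$ to denote this value, and

• Existence of positional optimal strategies: there is a pair $\tau_{\min}, \sigma_{\max}$ of strategies such that, for
all $v \in V$, $\mathrm{val}_{\mathcal{G}}(v) = \mathrm{val}_{\mathcal{G}}(v, \sigma_{\max}) = \mathrm{val}_{\mathcal{G}}(v, \tau_{\min})$ holds. Observe that for all $\sigma \in \Sigma$ and
$\tau \in T$ we have that $\mathrm{val}_{\mathcal{G}}(\sigma_{\max}) \sqsupseteq \mathrm{val}_{\mathcal{G}}(\sigma)$ and $\mathrm{val}_{\mathcal{G}}(\tau_{\min}) \sqsubseteq \mathrm{val}_{\mathcal{G}}(\tau)$.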

Observe (first and second item above) that classes of games with positional strategies guarantee
an optimal positional counter strategy for Player Min to all strategies $\sigma \in \Sigma$ of Player Max. We denote
these strategies by $\tau^c_\sigma$. Similarly, we denote the optimal positional counter strategy of Player Max to a
strategy $\tau \in T$ of Player Min by $\sigma^c_\tau$. While this counter strategy is not necessarily unique, we use the
convention in all proofs that $\tau^c_\sigma$ is always the same counter strategy for $\sigma \in \Sigma$, and $\sigma^c_\tau$ is always the
same counter strategy for $\tau \in T$.

Example 2.6. Consider the parity game arena shown in Figure 1. We use circles for the vertices of
Player Max and squares for Player Min. We label each vertex with its colour. Notice that a positional
strategy can be depicted just by specifying an outgoing edge for all the vertices of a player. The positional
strategy $\sigma$ of Player Max is depicted in blue and the positional strategy $\tau$ of Player Min is depicted
in red. In the example, $\mathrm{val}(1, \sigma, \tau) = (1, \emptyset, 0)$, $\mathrm{val}(4, \sigma, \tau) = (3, \{4\}, 1)$, $\mathrm{val}(3, \sigma, \tau) = (3, \emptyset, 0)$, and
$\mathrm{val}(0, \sigma, \tau) = (0, \emptyset, 0)$.

2.1 Classic Strategy Improvement Algorithm 

As discussed in the introduction, classic strategy improvement algorithms work well for classes of games 
that are positionally determined. Moreover, the evaluation function should be such that one can easily 
identify the set $\mathrm{Prof}(\sigma)$ of profitable updates and reach an optimum exactly where there are no profitable
updates. We formalise these prerequisites for a class of games to be good for strategy improvement
algorithms in this section.

Definition 2.7 (Profitable Updates). For a strategy $\sigma \in \Sigma$, an edge $(v, v') \in E$ with $v \in V_{\mathrm{Max}}$ is a
profitable update if $\sigma' \in \Sigma$ with $\sigma' : v \mapsto v'$ and $\sigma' : v'' \mapsto \sigma(v'')$ for all $v'' \neq v$ has a strictly greater
evaluation than $\sigma$, $\mathrm{val}_{\mathcal{G}}(\sigma') \sqsupset \mathrm{val}_{\mathcal{G}}(\sigma)$. We write $\mathrm{Prof}(\sigma)$ for the set of profitable updates.

Example 2.8. In our example from Figure 1, $\tau = \tau^c_\sigma$ is the optimal counter strategy to $\sigma$, such that
$\mathrm{val}(\sigma) = \mathrm{val}(\sigma, \tau)$. $\mathrm{Prof}(\sigma) = \{(3, 4), (3, 0)\}$, because both the successor to the left and the successor
to the right have a better valuation, $(3, \{4\}, 1)$ and $(0, \emptyset, 0)$, respectively, than the successor on the
selected self-loop, $(3, \emptyset, 0)$.

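For a strategy $\sigma$ and a functional (right-unique) subset $P \subseteq \mathrm{Prof}(\sigma)$ we define the strategy $\sigma^P$ with
$\sigma^P : v \mapsto v'$ if $(v, v') \in P$ and $\sigma^P : v \mapsto \sigma(v)$ if there is no $v' \in V$ with $(v, v') \in P$. For a class of
graph games, profitable updates are combinable if, for all strategies $\sigma$ and all non-empty functional
(right-unique) subsets $P \subseteq \mathrm{Prof}(\sigma)$ we have that $\mathrm{val}_{\mathcal{G}}(\sigma^P) \sqsupset \mathrm{val}_{\mathcal{G}}(\sigma)$. Moreover, we say that a class
of graph games is maximum identifying if $\mathrm{Prof}(\sigma) = \emptyset \Leftrightarrow \mathrm{val}_{\mathcal{G}}(\sigma) = \mathrm{val}_{\mathcal{G}}$. Algorithm 4 provides a
generic template for strategy improvement algorithms.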


Algorithm 4: Classic strategy improvement algorithm 

1 Let $\sigma_0$ be an arbitrary positional strategy. Set $i := 0$.

2 If $\mathrm{Prof}(\sigma_i) = \emptyset$ return $\sigma_i$

3 $\sigma_{i+1} := \sigma_i^P$ for some non-empty functional subset $P \subseteq \mathrm{Prof}(\sigma_i)$. Set $i := i + 1$. go to 2.


We say that a class of games is good for max strategy improvement if they are positionally
determined and have combinable and maximum identifying improvements.

Theorem 2.9. If a class of games is good for max strategy improvement then Algorithm 4 terminates
with an optimal strategy $\sigma$ ($\mathrm{val}_{\mathcal{G}}(\sigma) = \mathrm{val}_{\mathcal{G}}$) for Player Max.

As a remark, we can drop the combinability requirement while maintaining correctness when we 
restrict the updates to a single position, that is, when we require $P$ to be a singleton for every update.
We call such strategy improvement algorithms slow, and a class of games good for slow max strategy 
improvement if it is maximum identifying and positionally determined. 

Theorem 2.10. If a class of games is positionally determined with maximum identifying improvement,
then all slow strategy improvement algorithms terminate with an optimal strategy $\sigma$
($\mathrm{val}_{\mathcal{G}}(\sigma) = \mathrm{val}_{\mathcal{G}}$) for Player Max.

The proof for both theorems is the same. 

Proof. The strategy improvement algorithm will produce a sequence $\sigma_0, \sigma_1, \sigma_2, \ldots$ of positional
strategies with increasing quality $\mathrm{val}_{\mathcal{G}}(\sigma_0) \sqsubset \mathrm{val}_{\mathcal{G}}(\sigma_1) \sqsubset \mathrm{val}_{\mathcal{G}}(\sigma_2) \sqsubset \ldots$. As the set of positional
strategies is finite, this chain must be finite. As the game is maximum identifying, the stopping condition
provides optimality. □

Various concepts and results extend naturally for analogous claims about Player Min. We call a class
of games good for strategy improvement if it is good for max strategy improvement and good for min
strategy improvement. Parity games, mean payoff games, and discounted payoff games are all good for
strategy improvement (for both players). Moreover, the calculation of $\mathrm{Prof}(\sigma)$ is cheap in all of these
instances, which makes them well suited for strategy improvement techniques.
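
To make the template concrete, here is a minimal Python sketch of the slow (single-switch) variant of Algorithm 4 on a tiny discounted payoff game. It is an illustration under several assumptions, not an efficient or authoritative implementation: the game encoding `(v_min, edges, colour, lam)`, all function names, and the unnormalised discounted sum are choices made for this sketch; optimal counter strategies are found by brute-force enumeration; and profitable updates are tested directly against Definition 2.7 rather than by the cheap local check mentioned above.

```python
from fractions import Fraction
from itertools import product

def value(start, choice, colour, lam):
    """Discounted value of the unique play from `start` when every vertex follows `choice`."""
    seen, path, v = {}, [], start
    while v not in seen:                      # walk until the play closes a cycle
        seen[v] = len(path)
        path.append(v)
        v = choice[v]
    k = seen[v]                               # the cycle is path[k:]
    total = lambda vs: sum(lam**i * colour[u] for i, u in enumerate(vs))
    return total(path[:k]) + lam**k * total(path[k:]) / (1 - lam**(len(path) - k))

def positional(owned, edges):
    """Enumerate all positional strategies on the vertices in `owned`."""
    owned = sorted(owned)
    for targets in product(*(edges[v] for v in owned)):
        yield dict(zip(owned, targets))

def val_max(sigma, game):
    """val(sigma): value vector against a brute-forced optimal counter strategy of Min."""
    v_min, edges, colour, lam = game
    vectors = ({v: value(v, {**sigma, **tau}, colour, lam) for v in edges}
               for tau in positional(v_min, edges))
    # A pointwise minimal vector exists, so the vector of minimal sum is that minimum.
    return min(vectors, key=lambda vec: sum(vec.values()))

def profitable(sigma, game):
    """Prof(sigma) as in Definition 2.7: single switches that strictly improve val."""
    edges = game[1]
    base = val_max(sigma, game)
    prof = set()
    for v in sigma:
        for w in edges[v]:
            new = val_max({**sigma, v: w}, game)
            if new != base and all(new[u] >= base[u] for u in edges):
                prof.add((v, w))
    return prof

def classic_strategy_improvement(sigma, game):
    """Slow variant of Algorithm 4: apply one profitable switch per iteration."""
    while True:
        prof = profitable(sigma, game)
        if not prof:
            return sigma
        v, w = min(prof)                      # any single element of Prof would do
        sigma = {**sigma, v: w}

# Hypothetical two-vertex game: Max owns "a", Min owns "b", discount factor 1/2.
game = ({"b"}, {"a": ["a", "b"], "b": ["a", "b"]}, {"a": 0, "b": 1}, Fraction(1, 2))
optimal_sigma = classic_strategy_improvement({"a": "a"}, game)
```

In this toy game the initial strategy keeps the token on the colour 0 self-loop, and the single profitable update moves it towards the colour 1 vertex.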


3 Symmetric Strategy Improvement Algorithm 


We first extend the termination argument for classic strategy improvement techniques (Theorems 2.9
and 2.10) to symmetric strategy improvement, given as Algorithm 5.


Algorithm 5: Symmetric strategy improvement algorithm

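1 Let $\sigma_0$ and $\tau_0$ be arbitrary positional strategies. Set $i := 0$.

2 Determine $\sigma^c_{\tau_i}$ and $\tau^c_{\sigma_i}$

3 $\sigma_{i+1} := \sigma_i^P$ for $P \subseteq \mathrm{Prof}(\sigma_i) \cap \sigma^c_{\tau_i}$

4 $\tau_{i+1} := \tau_i^P$ for $P \subseteq \mathrm{Prof}(\tau_i) \cap \tau^c_{\sigma_i}$

5 if $\sigma_{i+1} = \sigma_i$ and $\tau_{i+1} = \tau_i$ return $(\sigma_i, \tau_i)$.

6 Set $i := i + 1$. go to 2.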
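
The following Python sketch illustrates Algorithm 5 on a small discounted payoff game. It rests on assumptions that are not part of the paper: the game encoding `(edges, colour, lam)`, the function names, brute-force computation of optimal counter strategies, a direct test of profitability against the definition, and the choice of $P$ as the whole intersection in steps 3 and 4.

```python
from fractions import Fraction
from itertools import product

def value(start, choice, colour, lam):
    """Discounted value of the unique play from `start` when every vertex follows `choice`."""
    seen, path, v = {}, [], start
    while v not in seen:
        seen[v] = len(path)
        path.append(v)
        v = choice[v]
    k = seen[v]
    total = lambda vs: sum(lam**i * colour[u] for i, u in enumerate(vs))
    return total(path[:k]) + lam**k * total(path[k:]) / (1 - lam**(len(path) - k))

def best_response(fixed, owned, game, sign):
    """Brute-force optimal positional counter strategy on `owned` against `fixed`
    (sign=+1: the responder maximises, sign=-1: minimises), with its value vector."""
    edges, colour, lam = game
    owned = sorted(owned)
    best = None
    for targets in product(*(edges[v] for v in owned)):
        reply = dict(zip(owned, targets))
        vec = {v: value(v, {**fixed, **reply}, colour, lam) for v in edges}
        if best is None or sign * sum(vec.values()) > sign * sum(best[1].values()):
            best = (reply, vec)
    return best

def profitable(strategy, opponent, game, sign):
    """Single switches of `strategy` that strictly improve its value vector for its owner."""
    edges = game[0]
    base = best_response(strategy, opponent, game, -sign)[1]
    prof = set()
    for v in strategy:
        for w in edges[v]:
            new = best_response({**strategy, v: w}, opponent, game, -sign)[1]
            if new != base and all(sign * (new[u] - base[u]) >= 0 for u in edges):
                prof.add((v, w))
    return prof

def symmetric_strategy_improvement(sigma, tau, game):
    """Sketch of Algorithm 5 with P chosen as the full intersection."""
    while True:
        counter_tau = best_response(sigma, tau.keys(), game, -1)[0]    # tau^c_sigma
        counter_sigma = best_response(tau, sigma.keys(), game, +1)[0]  # sigma^c_tau
        p_max = {(v, w) for v, w in profitable(sigma, tau.keys(), game, +1)
                 if counter_sigma[v] == w}
        p_min = {(v, w) for v, w in profitable(tau, sigma.keys(), game, -1)
                 if counter_tau[v] == w}
        if not p_max and not p_min:           # neither strategy changes: stop
            return sigma, tau
        sigma = {**sigma, **dict(p_max)}
        tau = {**tau, **dict(p_min)}

# Hypothetical two-vertex game: Max owns "a", Min owns "b", discount factor 1/2.
game = ({"a": ["a", "b"], "b": ["a", "b"]}, {"a": 0, "b": 1}, Fraction(1, 2))
result = symmetric_strategy_improvement({"a": "a"}, {"b": "b"}, game)
```

On this toy game both players switch in the first iteration, each following the optimal counter strategy to the other's current strategy, and the second iteration finds nothing left to improve.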


3.1 Correctness 


Lemma 3.1. The symmetric strategy improvement algorithm terminates for all classes of games that are 
good for strategy improvement. 


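Proof. We first observe that the algorithm yields a sequence $\sigma_0, \sigma_1, \sigma_2, \ldots$ of Player Max strategies
for $\mathcal{G}$ with improving values $\mathrm{val}_{\mathcal{G}}(\sigma_0) \sqsubseteq \mathrm{val}_{\mathcal{G}}(\sigma_1) \sqsubseteq \mathrm{val}_{\mathcal{G}}(\sigma_2) \sqsubseteq \ldots$, where equality,
$\mathrm{val}_{\mathcal{G}}(\sigma_i) = \mathrm{val}_{\mathcal{G}}(\sigma_{i+1})$, implies $\sigma_i = \sigma_{i+1}$. Similarly, for the sequence $\tau_0, \tau_1, \tau_2, \ldots$ of Player Min
strategies for $\mathcal{G}$, the values $\mathrm{val}_{\mathcal{G}}(\tau_0) \sqsupseteq \mathrm{val}_{\mathcal{G}}(\tau_1) \sqsupseteq \mathrm{val}_{\mathcal{G}}(\tau_2) \sqsupseteq \ldots$ improve (for Player Min), such
that equality, $\mathrm{val}_{\mathcal{G}}(\tau_i) = \mathrm{val}_{\mathcal{G}}(\tau_{i+1})$, implies $\tau_i = \tau_{i+1}$. As the number of values that can be taken is
finite, eventually both values stabilise and the algorithm terminates. □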

What remains to be shown is that the symmetric strategy improvement algorithm cannot terminate
with an incorrect result. In order to show this, we first prove the weaker claim that it is optimal in
$\mathcal{G}(\sigma, \tau, \sigma^c_\tau, \tau^c_\sigma) = (V_{\mathrm{Max}}, V_{\mathrm{Min}}, E', \mathrm{val})$ such that
$E' = \{(v, \sigma(v)) \mid v \in V_{\mathrm{Max}}\} \cup \{(v, \tau(v)) \mid v \in V_{\mathrm{Min}}\} \cup \{(v, \sigma^c_\tau(v)) \mid v \in V_{\mathrm{Max}}\} \cup \{(v, \tau^c_\sigma(v)) \mid v \in V_{\mathrm{Min}}\}$
is the subgame of $\mathcal{G}$ whose edges are those defined by the four positional strategies, when it
terminates with the strategy pair $\sigma$, $\tau$.
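
The edge set of this subgame is simply the union of the edges chosen by the four positional strategies. A small Python sketch, assuming strategies are stored as dictionaries from vertices to successors (the function name is hypothetical):

```python
def local_subgame_edges(sigma, tau, sigma_c, tau_c):
    """Edges selected by sigma, tau and the two optimal counter strategies."""
    edges = set()
    for strategy in (sigma, tau, sigma_c, tau_c):
        edges |= {(v, w) for v, w in strategy.items()}
    return edges
```

Every vertex keeps at least one outgoing edge, and at most two, so the subgame is again a game arena.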


Lemma 3.2. When the symmetric strategy improvement algorithm terminates with the strategy pair $\sigma$, $\tau$
on games that are good for strategy improvement, then $\sigma$ and $\tau$ are the optimal strategies for Players
Max and Min, respectively, in $\mathcal{G}(\sigma, \tau, \sigma^c_\tau, \tau^c_\sigma)$.


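Proof. For $\mathcal{G}(\sigma, \tau, \sigma^c_\tau, \tau^c_\sigma)$, both update steps are not restricted: the changes Player Max can
potentially select his updates from are the edges defined by $\sigma^c_\tau$ at the vertices $v \in V_{\mathrm{Max}}$ where $\sigma$ and
$\sigma^c_\tau$ differ ($\sigma(v) \neq \sigma^c_\tau(v)$). Consequently, $\mathrm{Prof}(\sigma) = \mathrm{Prof}(\sigma) \cap \sigma^c_\tau$.

Thus, $\sigma = \sigma'$ holds if, and only if, $\sigma$ is the result of an update step when using classic strategy
improvement in $\mathcal{G}(\sigma, \tau, \sigma^c_\tau, \tau^c_\sigma)$ when starting in $\sigma$. As the game is maximum identifying, $\sigma$ is the
optimal Player Max strategy for $\mathcal{G}(\sigma, \tau, \sigma^c_\tau, \tau^c_\sigma)$.

Likewise, Player Min can potentially select every update from $\tau^c_\sigma$ at vertices $v \in V_{\mathrm{Min}}$, and we
get $\mathrm{Prof}(\tau) = \mathrm{Prof}(\tau) \cap \tau^c_\sigma$ with the same argument. As the game is minimum identifying, $\tau$ is the
optimal Player Min strategy for $\mathcal{G}(\sigma, \tau, \sigma^c_\tau, \tau^c_\sigma)$. □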


We are now in a position to expand the optimality in the subgame $\mathcal{G}(\sigma, \tau, \sigma^c_\tau, \tau^c_\sigma)$ from Lemma 3.2
to global optimality of these strategies for $\mathcal{G}$.


Lemma 3.3. When the symmetric strategy improvement algorithm terminates with the strategy pair $\sigma$, $\tau$
on a game $\mathcal{G}$ that is good for strategy improvement, then $\sigma$ is an optimal Player Max strategy and $\tau$ an
optimal Player Min strategy.


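Proof. Let $\sigma$, $\tau$ be the strategies returned by the symmetric strategy improvement algorithm for a game
$\mathcal{G}$, and let $\mathcal{L} = \mathcal{G}(\sigma, \tau, \sigma^c_\tau, \tau^c_\sigma)$ denote the local game from Lemma 3.2 defined by them. Lemma 3.2
has established optimality in $\mathcal{L}$. Observing that the optimal responses in $\mathcal{G}$ to $\sigma$ and $\tau$, $\tau^c_\sigma$ and $\sigma^c_\tau$
respectively, are available in $\mathcal{L}$, we first see that they are also optimal in $\mathcal{L}$. Thus, we have

$\mathrm{val}_{\mathcal{L}}(\sigma) = \mathrm{val}_{\mathcal{L}}(\sigma, \tau^c_\sigma) = \mathrm{val}_{\mathcal{G}}(\sigma, \tau^c_\sigma)$ and

$\mathrm{val}_{\mathcal{L}}(\tau) = \mathrm{val}_{\mathcal{L}}(\sigma^c_\tau, \tau) = \mathrm{val}_{\mathcal{G}}(\sigma^c_\tau, \tau)$.

Optimality in $\mathcal{L}$ then provides $\mathrm{val}_{\mathcal{L}}(\sigma) = \mathrm{val}_{\mathcal{L}}(\tau)$. Putting these three equations together, we get
$\mathrm{val}_{\mathcal{G}}(\sigma, \tau^c_\sigma) = \mathrm{val}_{\mathcal{G}}(\sigma^c_\tau, \tau)$.

Taking into account that $\tau^c_\sigma$ and $\sigma^c_\tau$ are the optimal responses to $\sigma$ and $\tau$, respectively, in $\mathcal{G}$, we expand
this to $\mathrm{val}_{\mathcal{G}} \sqsupseteq \mathrm{val}_{\mathcal{G}}(\sigma) = \mathrm{val}_{\mathcal{G}}(\sigma, \tau^c_\sigma) = \mathrm{val}_{\mathcal{G}}(\sigma^c_\tau, \tau) = \mathrm{val}_{\mathcal{G}}(\tau) \sqsupseteq \mathrm{val}_{\mathcal{G}}$ and get
$\mathrm{val}_{\mathcal{G}} = \mathrm{val}_{\mathcal{G}}(\sigma) = \mathrm{val}_{\mathcal{G}}(\tau) = \mathrm{val}_{\mathcal{G}}(\sigma, \tau)$. □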

The lemmas in this subsection yield the following results.

Theorem 3.4. The symmetric strategy improvement algorithm is correct for games that are good for
strategy improvement.

Theorem 3.5. The slow symmetric strategy improvement algorithm is correct for positionally
determined games that are maximum and minimum identifying.

We implemented our symmetric strategy improvement algorithm based on the progress measures
introduced by Vöge and Jurdziński [29]. The first step is to determine the optimal counter strategies
$\tau^c_\sigma$ and $\sigma^c_\tau$ and the valuations for $\sigma$ and $\tau$.

Example 3.6. In our running example from Figure 1, we have discussed in the previous section that $\tau$
is the optimal counter strategy $\tau^c_\sigma$ and that $\mathrm{Prof}(\sigma) = \{(3, 4), (3, 0)\}$. In the optimal counter strategy $\sigma^c_\tau$
to $\tau$, Player Max moves from 3 to 4, and we get $\mathrm{val}(1, \tau) = (1, \emptyset, 0)$, $\mathrm{val}(4, \tau) = (4, \emptyset, 0)$, $\mathrm{val}(3, \tau) =
(4, \emptyset, 1)$, and $\mathrm{val}(0, \tau) = (0, \emptyset, 0)$. Consequently, $\mathrm{Prof}(\tau) = \{(4, 1)\}$. For the update of $\sigma$, we select
the intersection of $\mathrm{Prof}(\sigma)$ and $\sigma^c_\tau$. In our example, this is the edge from 3 to 4 (depicted in green). To
update $\tau$, we select the intersection of $\mathrm{Prof}(\tau)$ and $\tau^c_\sigma$. In our example, this intersection is empty, as the
current strategy $\tau$ agrees with $\tau^c_\sigma$.

3.2 A minor improvement on stopping criteria 

In this subsection, we look at a minor albeit natural improvement over Algorithm 5, shown as
Algorithm 6. In Algorithm 5 we used termination on both sides as a condition to terminate the algorithm. We could
alternatively check if either player has reached an optimum. Once this is the case, we can return the 
optimal strategy and an optimal counter strategy to it. 


Algorithm 6: Symmetric strategy improvement algorithm (Improved Stopping criteria) 

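1 Let $\sigma_0$ and $\tau_0$ be arbitrary positional strategies. Set $i := 0$.

2 Determine $\sigma^c_{\tau_i}$ and $\tau^c_{\sigma_i}$

3 if $\mathrm{Prof}(\sigma_i) = \emptyset$ return $(\sigma_i, \tau^c_{\sigma_i})$

4 if $\mathrm{Prof}(\tau_i) = \emptyset$ return $(\sigma^c_{\tau_i}, \tau_i)$

5 $\sigma_{i+1} := \sigma_i^P$ for $P \subseteq \mathrm{Prof}(\sigma_i) \cap \sigma^c_{\tau_i}$

6 $\tau_{i+1} := \tau_i^P$ for $P \subseteq \mathrm{Prof}(\tau_i) \cap \tau^c_{\sigma_i}$

7 Set $i := i + 1$. go to 2.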


The correctness of this stopping condition is provided by Theorems 2.9 and 2.10, and checking this
stopping condition is usually cheap: it suffices to check if $\mathrm{Prof}(\sigma)$ or $\mathrm{Prof}(\tau)$ is empty. This provides us
with a small optimisation, as we can stop as soon as one of the strategies involved is optimal. However 
this small optimisation can only provide a small advantage. 


Theorem 3.7. The difference in the number of iterations of Algorithm 5 and Algorithm 6 is at most
linear in the number of states of $\mathcal{G}$.

Figure 2: Friedmann’s lower bound game for the locally optimal strategy improvement algorithm


Proof. Let $\sigma$ be an optimal strategy for $\mathcal{G}$. When starting with a strategy pair $\sigma$, $\tau_0$ for some strategy
$\tau_0$ of Player Min, we first construct the optimal counter strategies $\tau^c_\sigma$ and $\sigma^c_{\tau_0}$. As $\sigma$ is optimal and $\mathcal{G}$
maximum identifying, $\mathrm{Prof}(\sigma) = \emptyset$, and strategy improvement will not change it. In particular, our
algorithm will always provide $\sigma' = \sigma$, irrespective of the optimal counter strategy $\sigma^c_{\tau_i}$ to a strategy $\tau_i$
of Player Min. This also implies that $\tau^c_\sigma$ will not change. It is now easy to see that, unless $\tau'_i = \tau_i$,
$\tau_{i+1} = \tau'_i$ differs from $\tau_i$ in at least one decision, and it differs by adhering to $\tau^c_\sigma$ at the positions where
it differs ($\forall v \in V_{\mathrm{Min}} .\ \tau_i(v) \neq \tau_{i+1}(v) \Rightarrow \tau_{i+1}(v) = \tau^c_\sigma(v)$). Such an update can happen at most once
for each Player Min position. The argument for starting with an optimal strategy $\tau$ of Player Min is
similar. □

4 Friedmann’s Traps 

In a seminal work on the complexity of strategy improvement [11], Friedmann uses a class of parity
games called 1-sink parity games. These games contain a sink node with the weakest odd parity in a
max-parity game. This sink node is reachable from every other node in the game and such a game is
won by Player Min eventually. Figure 2 shows a lower bound game from [11].

In order to obtain an exponential lower bound for the classic strategy improvement algorithm with 
the locally optimising policy, these sink games implement a binary counter realised by a gadget called 
a cycle gate which consists of two components. With n cycle gates, we have a representation of the n 
bits for an n bit counter. The first component of a cycle gate is called a simple cycle. In Figure 2, the
three smaller boxes shown in yellow are the simple cycles of the game. These simple cycles encode
the bits of the counter. The second component of the cycle gate gadget is called a deceleration lane.
This structure serves to ensure that any profitable updates to strategies are postponed by cycling through
seemingly more profitable improvements, in the order $r, s, a_1, a_2, \ldots$, before eventually turning to $e_i$.
This structure is shown as a shaded blue rectangle in Figure 2.

A simple cycle consists of exactly one Player Max controlled node $d$ with a weak odd colour $k$ and
one Player Min controlled node $e$ with the even colour $k + 1$. The Player Max node is also connected
to some set of external nodes in the game and the Player Min node is connected to an output node with
a high even colour on a path to the sink node. Given a strategy $\sigma$, we say that a simple cycle is closed
if we have an edge $\sigma(d) = e$. Otherwise, we say that the simple cycle is open. Opening and closing
cycles correspond to unsetting and setting bits. We then say a cycle gate is open or closed when its 
corresponding simple cycle is open or closed respectively. 

In these lower bound games, the simple cycles are connected to the deceleration lane in such a way
that lower valued cycles have fewer edges entering the deceleration lane, ensuring that lower open cycles
close before higher open cycles. This allows the less significant bits to be set and reset before the
more significant bits.

The deceleration lane hides sensible improvements, thus making the players take more iterations 
before taking the best improvement. It is then shown in [11] that incrementing a bit state always requires
more than one strategy iteration in 4 different phases. This gadget thus counts an exponential number of
improvement steps taken by the strategy improvement algorithm to flip n bits. For a detailed exposition
of the gadget and the exponential lower bound construction, we refer the reader to [11].


4.1 Escaping the traps with symmetric strategy improvement 


We discuss the effect of symmetric strategy improvement on Friedmann’s traps, with a focus on the 
simple cycles. Simple cycles are the central component of the cycle gates and the heart of the lower 
bound proof. As described above, an n-bit counter is represented by n cycle gates, each cycle gate 
embedding a smaller simple cycle. These simple cycles are reused exponentially often to represent n 
bits. Both players have the choice to open or close the simple cycles. 

The optimal strategy of both players in the simple cycles of Figure 2 is to turn right. (For Player
Max, one could say that he wants to leave the cycle, and for Player Min, one could say that she wants
to stay in it.) When the players agree to stay in the cycle, Player Max wins the parity game. In fact
these are the only places where Player Max can win positionally in this parity game. When running the 
symmetric strategy improvement algorithm for Player Max, the optimal counter strategy by Player Min 
is to move to the right in simple cycles where Player Max is moving to the right, and to move left in all 
other simple cycles. 

As mentioned before, Friedmann [11] showed that, when looking at an abstraction of the Player Max
strategy that only distinguishes the decisions of turning right or not turning right in the simple cycles, 
then they essentially behave like a binary counter that, with some delay (caused by the deceleration lane) 
will ‘count up’. More precisely, one step after the bit has been activated, all lower bits are reset. 
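
The counting behaviour can be illustrated in isolation with a plain binary counter. The sketch below is only an abstraction of the mechanism described above: it models neither Friedmann's games nor the deceleration lane, and the function name is hypothetical. Each step activates the least significant unset bit and resets all lower bits, and the function counts how many such steps are needed before every bit is set.

```python
def counter_increments(n_bits):
    """Number of increments that take an n-bit counter from all zeros to all ones."""
    bits = [0] * n_bits            # bits[0] is the least significant bit
    steps = 0
    while not all(bits):
        k = bits.index(0)          # least significant unset bit
        bits[k] = 1                # activate it ...
        bits[:k] = [0] * k         # ... and reset all lower bits
        steps += 1
    return steps
```

The number of steps is $2^n - 1$ for $n$ bits, which is the source of the exponential number of improvement steps.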

We now discuss how symmetric strategy improvement can beat this mechanism by taking the view
of both players into account. For this, we consider a starting configuration, where Player Min moves to
the right in the $j$ most significant simple cycle positions, where $j$ can be 0. Note that, when Player Min
moves right in all of these positions, she has found her optimal strategy and we can invoke Theorem
3.7 to show that the algorithm terminates in a linear number of steps, or simply stop when using the
alternative stopping condition.

The first observation is that changing the decision to moving left will not lead to an improvement, as
it produces a winning cycle of a quality (leading even colour) higher than the quality of any cycle
available for Player Max under the current strategy of Player Min. Let us now consider the less significant
position $j + 1$. First, we observe that moving to the right is a superior strategy. This can easily be seen:
moving to the left produces a cycle with a dominating even colour and thus turns out to be winning for
Player Max. Moving to the right in position $j + 1$ and (by our assumption) all more significant positions
removes this cycle and implies that the leading colour from this position is 1. This is clearly better for
Player Min. If Player Min uses a strategy where $j + 1$ is the most significant position where she decides
to move to the left, we have the following case distinctions for Player Max’s strategy in this simple cycle:

1. Player Max moves to the right in this simple cycle. Then moving to the right is also the optimal
counter strategy for Player Min, and her strategy will be updated accordingly.

2. Player Max does not move right in this simple cycle with his current strategy $\sigma$. Moving right in
this simple cycle is among $\mathrm{Prof}(\sigma)$, as one even colour is added to the set in the quality measure
in the local comparison. It is also the choice for the optimal counter strategy $\sigma^c_\tau$ to the current
strategy $\tau$ of Player Min, as this is the only way for Player Max to produce a valuation with
the dominating even colour of this simple cycle, while no valuation with a higher even colour is
possible.

Taking these two cases into consideration, Player Min will move to the right in the j most significant
positions after 2j improvement steps. When Player Max has found his optimal strategy, we can invoke
Theorem 3.7 to show that the algorithm terminates in a linear number of steps.

There are similar arguments for all kinds of traps that Friedmann has developed for strategy
improvement algorithms. We have not formalised these arguments on other instances, but we provide the
number of iterations needed by our symmetric strategy improvement algorithm for all of them in the
next section.

Note that the way in which Friedmann traps asymmetric strategy improvement has proven to be
quite resistant to the improvement policy (snare [10], random facet, globally optimal [25], etc.).

From the perspective of the traps, the different policies try to aim at a minor point in the mechanism 
of the traps, and this minor point is adjusted. The central mechanism, however, is not affected. All of
these examples have some variant of simple cycles at the heart of the counter and a deceleration lane to 
orchestrate the timely counting. 

Symmetric strategy improvement aims at the mechanism of the traps themselves. It seems that
examples that trap symmetric strategy improvement algorithms need to do more than just trap both
players (which could be done by copying the trap with inverse roles): they need to trap them
simultaneously. A proof that such traps do not exist is unlikely to be found easily, as it would imply that
symmetric strategy improvement solves parity (or, depending on the proof, mean or discounted payoff)
games in polynomial time. But it seems that such traps would need a different structure. A further
difference to asymmetric strategy improvement is that the deceleration lane ceases to work.

Taking into account that finding traps for asymmetric strategy improvement took decades and was 
very insightful, this looks like an interesting challenge for future research. 

5 Experimental Results 

We have implemented the symmetric strategy improvement algorithm for parity games and compared it
with the standard strategy improvement algorithm with the popular locally optimising and other
switching rules. To generate various examples we used the tools steadygame and stratimprgen, which
come as a part of the parity game solver collection PGSOLVER [15]. We have compared the
performance of our algorithm on parity games with 100 positions (see appendix) and found that the locally
optimising policy outperforms other switching rules. We therefore compare our symmetric strategy
improvement algorithm with the locally optimising strategy improvement below.

Since every iteration of both algorithms is rather similar—one iteration of our symmetric strategy 
improvement algorithm essentially runs two copies of an iteration of a classical strategy improvement 
algorithm—and can be performed in polynomial time, the key data to compare these algorithms is the 
number of iterations taken by both algorithms. 

Symmetric strategy improvement will often rule out improvements at individual positions: it
disregards profitable changes of Player Max and Min if they do not comply with σ^c_τ and τ^c_σ, respectively. It is
well known that considering fewer updates can lead to a significant increase in the number of updates on
random examples and benchmarks. An algorithm based on the random-facet method, e.g., needs
around a hundred iterations on the random examples with 100 positions we have drawn, simply because
it updates only a single position at a time. The same holds for a random-edge policy where only a single
position is updated. The figures for these two methods are given in the appendix.
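
The filtering of updates described above can be illustrated with a minimal sketch. It assumes that the set of profitable switches and the optimal counter strategy have already been computed elsewhere; the function name restrict_to_counter, the dictionary encoding of strategies, and the toy positions are hypothetical and serve only as illustration, not as the implementation used for the experiments.

```python
from typing import Dict, Hashable, Set

Strategy = Dict[Hashable, Hashable]


def restrict_to_counter(
    current: Strategy,
    profitable: Dict[Hashable, Set[Hashable]],
    counter: Strategy,
) -> Strategy:
    """Apply only those profitable switches that agree with the counter strategy.

    current    -- the player's current strategy (position -> chosen successor)
    profitable -- assumed precomputed: position -> set of profitable successors
    counter    -- assumed precomputed: the player's optimal counter strategy
                  to the opponent's current strategy
    """
    updated = dict(current)  # leave the input strategy untouched
    for position, choice in counter.items():
        # A switch is taken only if it is profitable AND it is the choice
        # made by the counter strategy; all other profitable switches
        # are disregarded.
        if position in current and choice in profitable.get(position, set()):
            updated[position] = choice
    return updated


# Toy data (hypothetical): three positions owned by one player.
sigma = {"a": "x", "b": "y", "c": "z"}
prof_sigma = {"a": {"u", "w"}, "b": {"v"}}
sigma_counter = {"a": "w", "b": "y", "c": "q"}

new_sigma = restrict_to_counter(sigma, prof_sigma, sigma_counter)
```

In this toy instance only position "a" is switched: the switch at "b" is profitable but does not agree with the counter strategy, and the counter strategy's choice at "c" is not profitable. One symmetric iteration would apply such a step once for each player.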

It is therefore good news that symmetric strategy improvement does not display a similar weakness. 


[Figure 3 plots; x-axis of the right-hand plot: number of counter bits in Friedmann’s trap.]


Figure 3: These plots compare the performance of the symmetric strategy improvement algorithm (data 
points in cyan circles) with standard strategy improvement using the locally optimising policy rule 
(data points in orange squares). The plot on the left side is for random examples generated using the 
steadygame 1000 2 4 3 5 6 command, while the plot on the right is for Friedmann’s trap from 
the previous section generated by the command stratimprgen -pg switchallsubexp i.


It even uses fewer updates when compared to classic strategy improvement with the popular locally
optimising and locally random policy rules. Note also that having fewer updates can lead to a faster evaluation
of the update, because unchanged parts do not need to be re-evaluated [3].

As shown in Figure 3, the symmetric strategy improvement algorithm not only performs better (on
average) in comparison with the traditional strategy improvement algorithm with the locally optimising
policy rule, but also avoids Friedmann’s traps for the strategy improvement algorithm. The following
table shows the performance of the symmetric strategy improvement algorithm on Friedmann’s traps for
other common switching rules. It is clear that our algorithm is not exponential for these classes of
examples.


Switch Rule            1   2   3   4   5   6   7   8   9  10
Cunningham             2   6   9  12  15  18  21  24  27  30
CunninghamSubexp       1   1   1   1   1   1   1   1   1   1
FearnleySubexp         4   7  11  13  17  21  25  29  33  37
FriedmannSubexp        4   9  13  15  19  23  27  31  35  39
RandomEdgeExpTest      1   2   2   2   2   2   2   2   2   2
RandomFacetSubexp      1   2   7   9  11  13  15  17  19  21
SwitchAllBestExp       4   5   8  11  12  13  15  17  18  19
SwitchAllBestSubExp    5   7   9  11  13  15  17  19  21  23
SwitchAllSubExp        3   5   7   9  10  11  12  13  14  15
SwitchAllExp           3   4   6   8  10  11  12  14  16  18
ZadehExp               -   6  10  14  18  21  25  28  32  35
ZadehSubexp            5   9  13  16  20  23  27  30  34  37

6 Discussion 

We have introduced symmetric approaches to strategy improvement, where the players take inspiration 
from the respective other’s strategy when improving theirs. This creates a rather moderate overhead, 
where each step is at most twice as expensive as a normal improvement step. For this moderate price, 
we have shown that we can break the traps Friedmann has introduced to establish exponential bounds 
for the different update policies in classic strategy improvement.

In hindsight, attacking a symmetric problem with a symmetric approach seems so natural that it is
quite surprising that it has not been attempted immediately. There are, however, good reasons for this,
but one should also concede that the claim is not entirely true: the concurrent update to the respective
optimal counter strategy has been considered quite early, but was dismissed, because it can
lead to cycles [5].

The first reason is therefore that it was folklore that symmetric strategy improvement does not work. 
The second reason is that the argument for the techniques that we have developed in this paper would 
have been restricted to beauty until some of the appeal of classic strategy improvement was caught in 
Friedmann’s traps. Friedmann himself, however, remained optimistic: 

We think that the strategy iteration still is a promising candidate for a polynomial time 
algorithm, however it may be necessary to alter more of it than just the improvement policy. 

This is precisely what the introduction of symmetry and co-improvement tries to do.

Figure 4: These plots compare the performance of the symmetric strategy improvement algorithm (data
points in cyan circles) with standard strategy improvement using the locally optimising policy rule (data
points in orange squares), random-edge switching rule (data points in red triangles), random-facet rule
(data points in blue triangles), and switch-half rule (data points in green triangles). These plots are for
random examples generated using the steadygame 100 2 4 3 5 6 command from PGSolver.
The results from randomized switching rules (random-edge, random-facet, and switch-half) presented
here are taken as the average number of iterations over four executions of the corresponding algorithms.